Building A Lexical Domain Map From Text Corpora
Abstract
SUMMARY In information retrieval the task is to extract from the database all, and only, the documents which are relevant to a user query, even when the query and the documents use little common vocabulary. In this paper we discuss the problem of automatic generation of lexical relations between words and phrases from large text corpora and their application to automatic query expansion in information retrieval. Reported here are some preliminary results and observations from the experiments with an 85 million word Wall Street Journal database and a 45 million word San Jose Mercury News database (parts of the 0.5 billion word TIPSTER/TREC database).
Similar resources
Published vs. Postgraduate Writing in Applied Linguistics: The Case of Lexical Bundles
Abstract: Lexical bundles, as building blocks of coherent discourse, have been the subject of much research in the last two decades. While many of such studies have been mainly concerned with exploring variations in the use of these word sequences across different registers and disciplines, very few have addressed the use of some particular groups of lexical bundles within some gen...
Building domain specific lexical hierarchies from corpora
In this article, we present a new algorithm for building domain specific lexical hierarchies from texts. The basic elements of such a hierarchy are the normalized terms – mono and multi-word terms – extracted from a large corpus by a terminological extractor. The algorithm relies on collocations for representing the meaning of these terms, finding hierarchical relations between them and finally...
User-Centered Analysis of Corpora Using Semantic Features Redundancy
Accessing textual information is still a complex activity when users have to browse through large corpora or long texts. In order to help users in such tasks, we propose a model dedicated to lexical representation of thematic domains as well as tools for personal corpora analysis. The lexical model is a differential one, inspired by Saussure's semiotics. It consists in structuring and describin...
Developing Domain-Specific Gesture Recognizers for Smart Diagram Environments
Computer understanding of visual languages in pen-based environments requires a combination of lexical analysis in which the basic tokens are recognized from hand-drawn gestures and syntax analysis in which the structure is recognized. Typically, lexical analysis relies on statistical methods while syntax analysis utilizes grammars. The two stages are not independent: contextual information pro...
Mining Social Deliberation in Online Communication - If You Were Me and I Were You
Social deliberative skills are collaborative life-skills. These skills are crucial for communicating in any collaborative processes where participants have heterogeneous opinions and perspectives driven by different assumptions, beliefs, and goals. In this paper, we describe models using lexical, discourse, and gender demographic features to identify whether or not participants demonstrate soci...
Publication date: 1994